Knowledge Distillation with Feature Self Attention
Authors
Abstract
With the rapid development of deep learning technology, the size and performance of networks continue to grow, making model compression essential for commercial applications. In this paper, we propose a Feature Self Attention (FSA) module, a new knowledge distillation method that compresses a model by extracting correlation information between hidden features. FSA does not require any special architecture or matching between the teacher and student models. By removing the multi-head structure and the repeated self-attention blocks of the existing attention mechanism, it minimizes the number of added parameters. Based on ResNet-18 and ResNet-34, the added parameters amount to only 2.00M, and training is also the fastest among the compared benchmark models. Experiments demonstrate that using the interrelationship between hidden features as a distillation loss can benefit student models, indicating the importance of considering feature correlations in neural network compression. The method was also verified by training a vanilla student from scratch, without pre-trained weights.
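The abstract does not spell out the exact FSA architecture, so the following is only a minimal sketch under assumptions: a single-head self-attention over pooled hidden-layer features (no multi-head, no stacked blocks), with teacher and student correlation maps matched by an MSE loss. All class names, shapes, and the choice of MSE are illustrative, not the paper's definitive implementation.

```python
# Hypothetical sketch of a single-head Feature Self Attention (FSA) distillation loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureSelfAttention(nn.Module):
    """Single-head self-attention over a set of hidden feature vectors,
    keeping the number of added parameters small (no multi-head, no repeats)."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim, bias=False)
        self.key = nn.Linear(dim, dim, bias=False)

    def forward(self, feats):
        # feats: (batch, num_features, dim), e.g. pooled outputs of several hidden layers
        q, k = self.query(feats), self.key(feats)
        attn = torch.softmax(q @ k.transpose(-2, -1) / feats.size(-1) ** 0.5, dim=-1)
        return attn  # (batch, num_features, num_features) correlation map

def fsa_distillation_loss(student_feats, teacher_feats, fsa_student, fsa_teacher):
    # Match the inter-feature correlation maps of student and teacher (assumed MSE).
    return F.mse_loss(fsa_student(student_feats), fsa_teacher(teacher_feats))
```

In this sketch the correlation maps have the same shape regardless of channel width, which is one way such a loss can avoid requiring the student to structurally match the teacher.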
Related articles
Topic Distillation with Knowledge Agents
This is the second year that our group participates in TREC’s Web track. Our experiments focused on the Topic distillation task. Our main goal was to experiment with the Knowledge Agent (KA) technology [1], previously developed at our Lab, for this particular task. The knowledge agent approach was designed to enhance Web search results by utilizing domain knowledge. We first describe the generi...
Sequence-Level Knowledge Distillation
Neural machine translation (NMT) offers a novel alternative formulation of translation that is potentially simpler than statistical approaches. However to reach competitive performance, NMT models need to be exceedingly large. In this paper we consider applying knowledge distillation approaches (Bucila et al., 2006; Hinton et al., 2015) that have proven successful for reducing the size of neura...
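For context, a minimal sketch of the word-level distillation loss (Hinton et al., 2015) that this line of work builds on: a temperature-softened KL term between teacher and student logits. Sequence-level distillation instead trains the student on full output sequences decoded by the teacher; the function name and temperature value below are illustrative assumptions.

```python
# Standard word-level knowledge distillation loss (temperature-softened KL divergence).
import torch.nn.functional as F

def word_level_kd_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions, then match them per token with KL divergence.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```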
The Relationship Between Mindfulness and Attention With Academic Self-Efficacy
The purpose of this research was to study the relationship between mindfulness and attention with academic self-efficacy in high school students in the 94-95 academic year in the city of Lordegan. The study was descriptive and correlational. The statistical population consisted of all high school students, numbering 2,814. Three hundred high school students from Lordegan were c...
Learning Loss for Knowledge Distillation with Conditional Adversarial Networks
There is increasing interest in accelerating neural networks for real-time applications. We study the student-teacher strategy, in which a small and fast student network is trained with the auxiliary information provided by a large and accurate teacher network. We use conditional adversarial networks to learn the loss function to transfer knowledge from teacher to student. The proposed method...
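A rough sketch, under assumptions, of using a discriminator as a learned distillation loss: the discriminator tries to tell teacher features from student features, and the student is trained to fool it. The layer sizes and the plain (non-conditional) setup here are simplifications of the conditional adversarial formulation described above.

```python
# Adversarial distillation: discriminator separates teacher vs. student features.
import torch
import torch.nn as nn

class FeatureDiscriminator(nn.Module):
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, feats):
        return self.net(feats)  # one real/fake logit per feature vector

def adversarial_distillation_losses(disc, student_feats, teacher_feats):
    bce = nn.BCEWithLogitsLoss()
    ones = torch.ones(teacher_feats.size(0), 1)
    zeros = torch.zeros(student_feats.size(0), 1)
    # Discriminator: teacher features are "real", student features are "fake".
    d_loss = bce(disc(teacher_feats), ones) + bce(disc(student_feats.detach()), zeros)
    # Student: produce features the discriminator labels as "real".
    g_loss = bce(disc(student_feats), ones)
    return d_loss, g_loss
```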
Learning Efficient Object Detection Models with Knowledge Distillation
Despite significant accuracy improvements in convolutional neural network (CNN) based object detectors, they often require prohibitive runtimes to process an image for real-time applications. State-of-the-art models often use very deep networks with a large number of floating point operations. Efforts such as model compression learn compact models with fewer parameters, but with much ...
Journal
Journal title: IEEE Access
Year: 2023
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2023.3265382